Introduction to raw accelerometry data

Above sensor images are sourced from respective vendors’ websites.

Above sensor images are sourced from respective vendors’ websites.

movie

Class project raw accelerometry data set

Data set description

Overview

Raw accelerometry data collected during outdoor walking, stair climbing, and driving for 32 healthy adults.

  • Accelerometry data were collected simultaneously at four body locations: left wrist, left hip, left ankle, and right ankle, at a sampling frequency of 100Hz (100 observations per second).
  • The 3-axial ActiGraph GT3X+ devices were used to collect the data.
  • The data include labels of activity type performed (walking, descending stairs, ascending stairs, driving, clapping) for each time point of data collection.
  • Participants’ demographic information is also included.
  • All data are anonymized.

Background

  • Data were collected as a part of the study on Identification of Walking, Stair Climbing, and Driving Using Wearable Accelerometers, sponsored by the Indiana University CTSI grant and conducted at the Department of Biostatistics, Fairbanks School of Public Health at Indiana University.
  • The study was led by Dr. Jaroslaw Harezlak, assisted by Drs. William Fadel and Jacek Urbanek.
  • The study was approved by the Institutional Review Board of Indiana University; all participants provided written informed consent.

Data collection methods

The Methods description provided below is sourced from Reference [1] (Section: “2 Data Collection”) except from one note referenced elsewhere.

Study participants

  • 32 healthy adults (13 men and 19 women) between 23 and 52 years of age participated in the study.
  • 31 of the participants were right-handed.

Devices setup and placement

  • Participants wore four 3-axial ActiGraph GT3X+ accelerometers: one on the left ankle, one on the right ankle, one on the left hip, and one on the left wrist.
  • All four devices were synchronized to the same external clock. (In theory, this procedure should assure parallel measurement among devices; however, a subsecond-level desynchronization of the devices could have happened over time, see [3]: “3.8 Synchronization of Multiple PA Monitors”.)
  • For each device, the data collection frequency was set to 100 Hz (100 observations per second).
  • Each device was attached using velcro bands. The ankle accelerometers were worn on the outside of the ankles. The wrist accelerometer was worn similar to a regular watch on the top of the left wrist. The hip accelerometer was attached to the belt of the participant on the left hip, but when a belt was not available, the device was either attached to the corresponding belt loop or clipped to the waistband.

Walking and driving trials

  • The study included a walking trial (approximately 0.66 miles) followed by a driving trial (approximately 12.8 miles).
  • The walking trial consisted of five periods of walking on level ground, six periods of descending stairs, and six periods of ascending stairs, and lasted a total between 9.0 and 13.5 min. The data collection protocol requested participants to walk at their usual pace along a predefined course to simulate free-living activities. The data from one participant included an additional period of walking on the level ground due to the participant briefly forgetting the instructions before turning around to ascend the stairs.
  • Immediately after the walking period, participants were accompanied to their vehicle, and they then drove on a predefined route that included both highway and city driving. The driving trial lasted between 18 and 30 min, depending on traffic.
  • In order to ensure accuracy of the starting and stopping times for different activities, participants were asked to clap three times at the beginning and end of each activity internally marking the raw accelerometry data for the wrist with three consecutive spikes in the signal. Those internal markings were used to accurately assign activity labels for each section of the protocol. Once the activity labels were assigned, the clapping signal +/-0.5 s of data was deleted to mimic smooth transitions between activities. Data were downloaded immediately following each participant’s session.

Database files guide

Database files contain:

  1. raw_accelerometry_data - a directory with 32 data files in CSV format. Each file corresponds to raw accelerometry data measurements of 1 study participant. File names follow the convention: "subj_id.csv". Each file contains 14 variables (column name in italics):
  • activity - Type of activity (1=walking; 2=descending stairs; 3=ascending stairs; 4=driving; 77=clapping; 99=non-study activity)
  • time_s - Time from device initiation (seconds [s])
  • lw_x - Left wrist x-axis measurement (gravitation acceleration [g])
  • lw_y - Left wrist y-axis measurement (gravitation acceleration [g])
  • lw_z - Left wrist z-axis measurement (gravitation acceleration [g])
  • lh_x - Left hip x-axis measurement (gravitation acceleration [g])
  • lh_y - Left hip y-axis measurement (gravitation acceleration [g])
  • lh_z - Left hip z-axis measurement (gravitation acceleration [g])
  • la_x - Left ankle x-axis measurement (gravitation acceleration [g])
  • la_y - Left ankle y-axis measurement (gravitation acceleration [g])
  • la_z - Left ankle z-axis measurement (gravitation acceleration [g])
  • ra_x - Right ankle x-axis measurement (gravitation acceleration [g])
  • ra_y - Right ankle y-axis measurement (gravitation acceleration [g])
  • ra_z - Right ankle z-axis measurement (gravitation acceleration [g])
  1. raw_accelerometry_data_dict.csv - a CSV file containing the description of 14 variables that each file in the raw_accelerometry_data directory consists of.

  2. participant_demog.csv - a CSV file with participants demographic information. The file contains 7 variables (column name in italics):

  • subj_id - Participant ID (a character scalar). The value in this column can be matched with a file name (without “.csv” extension) of a file in raw_accelerometry_data directory.
  • gender - Participant gender (a character scalar; one of: “male”, “female”).
  • age - Participant age (an integer scalar).
  • height_in - Participant height (an integer scalar; expressed in inches).
  • weight_lbs - Participant weight (an integer scalar; expressed in pounds).
  • race - Participant race (a character scalar; one of: “asian”, “black”, “caucasian”).
  • right_handed - Participant handedness (an integer scalar; 1 if right-handed, 0 otherwise).

All data are anonymized.

Database location

  • (Publishing in progress) PhysioNet - platform that offers free access to large collections of physiological and clinical data and related open-source software. View on PhysioNet:

  • (Available today) Personal Dropbox. Exaclty the same data and data structure as we submitted to PhysioNet, just zipped in one directory. See below instructions for downloading.

Download data set from Dropbox location

library(data.table)
library(magrittr)
library(reshape2)
library(ggplot2)
library(latex2exp)
library(raster)
library(dplyr)
select <- dplyr::select; filter <- dplyr::filter; summarize <- dplyr::summarize
mutate <- dplyr::mutate

## Replace with your directory to the folder where data/IU_walking_driving_climbing.zip
## is/will be downloaded
wd <- "/Users/martakaras/Dropbox/_PROJECTS/WearableComputingShortCourse"
wd_dat.dir <- file.path(wd, "data")

acc_dat.dir <- file.path(wd_dat.dir, "IU_walking_driving_climbing")
acc_dat.zip <- file.path(wd_dat.dir, "IU_walking_driving_climbing.zip")
acc_dat.zip.url <- "https://www.dropbox.com/s/pf5l2ki9t2ae2df/IU_walking_driving_climbing.zip?dl=1"

## Make data dir if does not exist yet
if (!dir.exists(wd_dat.dir)){
  dir.create(wd_dat.dir)
}

## If files not downloaded yet, download (182.6 MB)
if (!dir.exists(acc_dat.dir) & !file.exists(acc_dat.zip)){
  ## Download .zip file
  download.file(acc_dat.zip.url, acc_dat.zip)
  ## Unzip into data/ directory
  unzip(acc_dat.zip, exdir = wd_dat.dir)
  ## Remove .zip file
  file.remove(acc_dat.zip)
}

Read data

Demographic data

Read demographic data of all 32 participants

fpath.demog <- file.path(acc_dat.dir, "participant_demog.csv")
df.demog <-  as.data.frame(fread(fpath.demog))
head(df.demog)
##      subj_id gender age height_in weight_lbs      race right_handed
## 1 idabd0c53c   male  23        72        165 caucasian            1
## 2 id5993bf4a female  45        69        141 caucasian            1
## 3 idd80ac2b4 female  29        73        154 caucasian            1
## 4 id82b9735c   male  43        71        185 caucasian            1
## 5 id8af5374b   male  47        75        238 caucasian            1
## 6 id650857ca   male  34        70        140 caucasian            1

Raw accelerometry data (for 1 participant)

What unzipped IU data dir contains?

list.files(acc_dat.dir)
## [1] "participant_demog.csv"           "raw_accelerometry_data"         
## [3] "raw_accelerometry_data_dict.csv"
list.files(file.path(acc_dat.dir, "raw_accelerometry_data"))
##  [1] "id00b70b13.csv" "id079c763c.csv" "id1165e00c.csv" "id1c7e64ad.csv" "id1f372081.csv"
##  [6] "id34e056c8.csv" "id37a54bbf.csv" "id3e3e50c7.csv" "id4ea159a8.csv" "id5308a7d6.csv"
## [11] "id5993bf4a.csv" "id650857ca.csv" "id687ab496.csv" "id7c20ee7a.csv" "id82b9735c.csv"
## [16] "id86237981.csv" "id8af5374b.csv" "id8e66893c.csv" "id9603e9c3.csv" "ida61e8ddf.csv"
## [21] "idabd0c53c.csv" "idb221f542.csv" "idbae5a811.csv" "idc735fc09.csv" "idc91a49d0.csv"
## [26] "idd80ac2b4.csv" "idecc9265e.csv" "idf1ce9a0f.csv" "idf540d82b.csv" "idf5e3678b.csv"
## [31] "idfc5f05e4.csv" "idff99de96.csv"

Read raw accelerometry data of 1 study participant.

IU_dat.raw.fnames <- list.files(file.path(acc_dat.dir, "raw_accelerometry_data"))

fname.i <- IU_dat.raw.fnames[1]
fpath.i <- file.path(acc_dat.dir, "raw_accelerometry_data", fname.i)
df.i <-  as.data.frame(fread(fpath.i))
head(df.i)
##   activity time_s   lw_x   lw_y   lw_z   lh_x   lh_y  lh_z   la_x  la_y   la_z   ra_x  ra_y  ra_z
## 1       99   0.01  0.039  1.020 -0.020 -0.180  1.234 0.023  0.156 0.855 -0.582  0.887 0.449 0.074
## 2       99   0.02 -0.629 -0.461  0.973 -0.246  0.137 0.969 -0.707 0.559  0.449 -0.027 0.141 0.934
## 3       99   0.03 -0.926 -1.262  0.691  0.238 -0.328 1.219 -1.445 1.367  0.367 -0.164 0.660 1.344
## 4       99   0.04 -0.871 -1.496 -0.246  0.711 -0.484 0.414 -1.660 1.645 -0.543 -0.141 1.027 0.707
## 5       99   0.05 -0.727 -1.621 -0.559  1.031 -0.297 0.145 -1.762 1.676 -0.918 -0.121 1.230 0.438
## 6       99   0.06 -0.543 -1.664 -0.629  1.121 -0.246 0.137 -1.797 1.648 -0.988 -0.051 1.422 0.371

Map activity type (map values were learned from raw_accelerometry_data_dict.csv) file.

map_vec <- rep(NA, 99)
map_vec[c(1,2,3,4,77,99)] <- c("walking", "descending_stairs",  "ascending_stairs",
                               "driving", "clapping", "non_study_activity")
df.i$activity_lab <- map_vec[df.i$activity]
table(df.i$activity_lab)
## 
##   ascending_stairs           clapping  descending_stairs            driving non_study_activity 
##               9125               3544               9766             128537             103888 
##            walking 
##              48440

Duration time of accelerometry data collection.

## Frequency of data collection (sensor setting; number of observations per second)
hz <- 100

## How many minutes of data colleciton total? 
round(nrow(df.i) / (hz * 60), 2)
## [1] 50.55
## How many minutes of data for each activity? 
round(table(df.i$activity_lab) / (hz * 60), 2)
## 
##   ascending_stairs           clapping  descending_stairs            driving non_study_activity 
##               1.52               0.59               1.63              21.42              17.31 
##            walking 
##               8.07

Raw data vizualization

Reshape data.

df.lw <- df.i %>% 
  select(activity_lab, time_s, x = lw_x, y = lw_y, z = lw_z) %>%
  mutate(loc_id = "left_wrist")

df.lh <- df.i %>% 
  select(activity_lab, time_s, x = lh_x, y = lh_y, z = lh_z) %>%
  mutate(loc_id = "left_hip")

df.la <- df.i %>% 
  select(activity_lab, time_s, x = la_x, y = la_y, z = la_z) %>%
  mutate(loc_id = "left_ankle")

df.ra <- df.i %>% 
  select(activity_lab, time_s, x = ra_x, y = ra_y, z = ra_z) %>%
  mutate(loc_id = "right_ankle")

df.all <- rbind(df.lw, df.lh, df.la, df.ra)
rm(df.lw, df.lh, df.la, df.ra, df.i)
head(df.all)
##         activity_lab time_s      x      y      z     loc_id
## 1 non_study_activity   0.01  0.039  1.020 -0.020 left_wrist
## 2 non_study_activity   0.02 -0.629 -0.461  0.973 left_wrist
## 3 non_study_activity   0.03 -0.926 -1.262  0.691 left_wrist
## 4 non_study_activity   0.04 -0.871 -1.496 -0.246 left_wrist
## 5 non_study_activity   0.05 -0.727 -1.621 -0.559 left_wrist
## 6 non_study_activity   0.06 -0.543 -1.664 -0.629 left_wrist

Plot 0-10 seconds of raw accelerometry data from walking, across four sensor locations.

loc_id.level <- c("left_wrist", "left_hip", "left_ankle", "right_ankle")
activity_lab.level <- c("non_study_activity", "driving", "clapping", "walking", 
                        "ascending_stairs", "descending_stairs")
plt.df <- 
  df.all %>% 
  filter(activity_lab == "walking") %>%
  group_by(loc_id) %>%
  mutate(time_s_act = row_number(time_s) / 100) %>%
  filter(time_s_act >= 0, time_s_act <= 10) %>%
  select(-time_s, -activity_lab) %>%
  reshape2::melt(id.vars = c("time_s_act", "loc_id")) %>%
  mutate(loc_id = factor(loc_id, levels = loc_id.level))

ggplot(plt.df, aes(x = time_s_act, y = value, color = variable)) + 
  geom_line() + 
  facet_grid(loc_id ~ .) + 
  labs(color = "Sensor\naxis", y = TeX("Acceleration measurement \\[\\textit{g}\\]"), x = "Exercise time [s]")

Plot 0-5 seconds of raw accelerometry data, across four sensor locations, across various activities.

plt.df <- 
  df.all %>% 
  filter(activity_lab != "non_study_activity") %>%
  group_by(loc_id, activity_lab) %>%
  mutate(time_s_act = row_number(time_s) / 100) %>%
  filter(time_s_act >= 0, time_s_act <= 5) %>%
  select(-time_s) %>%
  reshape2::melt(id.vars = c("time_s_act", "loc_id", "activity_lab")) %>%
  mutate(loc_id = factor(loc_id, levels = loc_id.level),
         activity_lab = factor(activity_lab, levels = activity_lab.level))

ggplot(plt.df, aes(x = time_s_act, y = value, color = variable)) + 
  geom_line() + 
  facet_grid(loc_id ~ activity_lab) + 
  labs(color = "Sensor\naxis", y = TeX("Acceleration measurement \\[\\textit{g}\\]"), x = "Exercise time [s]")

Some raw data transformation and summaries

Transformation from cartesian to spherical coordinate system

Image source: MathWorks documentation: https://www.mathworks.com/help/matlab/ref/cart2sph.html .

The mapping from three-dimensional Cartesian coordinates \((x,y,z)\) to spherical coordinates \((\text{azimuth},\text{elevation},r)\) is given by:

\[ \begin{aligned} \text{azimuth} & = {\displaystyle \operatorname {arctan2} (y,x)}, \\ \text{elevation} & = {\displaystyle \operatorname {arctan2} (z,\sqrt{x^2 + y^2})}, \\ r & = \sqrt{x^2 + y^2 + z^2}, \end{aligned} \] where \({\displaystyle \operatorname {arctan2}}\) is defined as the angle in the Euclidean plane, given in radians, between the positive \(x\)-axis and the ray to the point \((x,y) \neq (0,0)\).

Note:

  • To compute \((\text{azimuth},\text{elevation},r)\), I use \(raster::atan2\); it handles the “unpleasant” case when \(x\) is numerically zero.
  • I use vm (vector magnitude) column instead of r for 3rd coordinate.
## Compute spherical variables
df.all %<>% 
  mutate(azimuth = atan2(y, x),
         elevation = atan2(z, sqrt(x^2 + y^2)),
         vm = sqrt(x^2 + y^2 + z^2))

## Observe range of values
range(df.all$azimuth)    ## theoretical range: [-pi, pi]
## [1] -3.139973  3.141593
range(df.all$elevation)  ## theoretical range: [-pi/2, pi/2]
## [1] -1.570796  1.558254
range(df.all$vm)         ## theoretical range: [0, inf)
## [1] 0.000000 8.735085

Spherical coordinate system: radius (vector magnitude)

Spherical coordinate \(r\) (radius), aka vector magnitude \((vm)\), is often used to reduce the dimensionality of raw accelerometry time-series \((x,y,z)\). Again, it is computed as \(\sqrt{x^2 + y^2 + z^2}\) at each time point resulting in 1- instead of 3-dimensional time-series.

Plot 0-5 seconds of vector magnitude of accelerometry data, across four sensor locations, across various activities.

plt.df <- 
  df.all %>% 
  filter(activity_lab != "non_study_activity") %>%
  group_by(loc_id, activity_lab) %>%
  mutate(time_s_act = row_number(time_s) / 100) %>%
  filter(time_s_act >= 0, time_s_act <= 5) %>%
  group_by() %>%
  mutate(loc_id = factor(loc_id, levels = loc_id.level),
         activity_lab = factor(activity_lab, levels = activity_lab.level))

ggplot(plt.df, aes(x = time_s_act, y = vm, group = 1)) + 
  geom_line() + 
  facet_grid(loc_id ~ activity_lab) + 
  labs(y = TeX("Vector magnitude measurement \\[\\textit{g}\\]"), x = "Exercise time [s]")

Spherical coordinate system: azimuth, elevation

Plot 0-5 seconds of vector magnitude, azimuth and elevation coordinates of accelerometry data from walking, collected at wrist.

plt.df <- 
  df.all %>% 
  filter(activity_lab == "walking", 
         loc_id == "left_wrist") %>%
  group_by(loc_id, activity_lab) %>%
  mutate(time_s_act = row_number(time_s) / 100) %>%
  filter(time_s_act >= 0, time_s_act <= 10) %>%
  select(time_s_act, loc_id, activity_lab, vm, azimuth, elevation) %>%
  reshape2::melt(id.vars = c("time_s_act", "loc_id", "activity_lab")) %>%
  mutate(loc_id = factor(loc_id, levels = loc_id.level),
         activity_lab = factor(activity_lab, levels = activity_lab.level))

ggplot(plt.df, aes(x = time_s_act, y = value, color = variable)) + 
  geom_line() + 
  facet_grid(variable ~ activity_lab, scales = "free_y") + 
  labs(color = "Spherical\ncoordinate", y = TeX("Measurement"), x = "Exercise time [s]",
       title = "Walking accelerometry data collected at left wrist")

Example: arm swing movement range, expressed in angle degrees

## Azminuth values range in first 4 seconds
azimuth_vals <- 
  plt.df %>%
  filter(variable == "azimuth", time_s_act < 4) %>%
  pull(value) 
## Azminuth values range: conversion of radians to degrees
range(azimuth_vals) * 180 / pi
## [1] -135.88140  -30.85325
abs(diff(range(azimuth_vals) * 180 / pi))
## [1] 105.0282

Aggregate summary of data: vector magnitude count

In reality, it is often challenging to make a plot of all data points collected at sampling frequency of 100 Hz even in 50 minutes-long time-series.

A one way to summarize accelerometry data of such high density is to use a \((vmc)\) - vector magnitude count (also known as the mean amplitude deviation). For \(\overline{vm}(t,H)\) - average of \((vm)\) time-series over time window of length \(H\) starting at time \(t\), we define \[\mathrm { vmc } ( t, H ) = \frac { 1 } { H } \sum _ { h = 0 } ^ { H - 1 } \left| vm ( t + h ) - \overline{vm}(t,H) \right|.\]

Compute \(vmc\) for data collected at wrist from whole ~ 50 minutes of data. Use \(H = 3s\) window.

df.sub <- 
  df.all %>% 
  filter(loc_id == "left_wrist") %>%
  arrange(time_s)

## Function to compute vmc from a vm window vector
vmc <- function(vm.win){
  mean(abs(vm.win - mean(vm.win)))
}

## Compute vmc vector in 3-seconds windows 
vm <- df.sub$vm
## Vector length of 3-seconds data window given data collection frequency 100 Hz 
hz <- 100
win.vl <- hz * 3  
rn.seq <- seq(1, to = length(vm), by = win.vl)

vmc.vec <- sapply(rn.seq, function(rn.i){
  vm.win.idx <- rn.i : (rn.i + win.vl - 1)
  vm.win <- vm[vm.win.idx]
  vmc(vm.win)
})

vmc.df <- data.frame(
  vmc = vmc.vec,
  time_s = df.sub$time_s[rn.seq],
  rn_seq = rn.seq,
  activity_lab = df.sub$activity_lab[rn.seq]) %>%
  filter(activity_lab != "non_study_activity") %>%
  mutate(activity_lab = factor(activity_lab, levels = activity_lab.level))

ggplot(vmc.df, aes(x = time_s / 60, y = vmc, color = activity_lab)) + 
  geom_point(alpha = 0.7) + 
  labs(x = "Experiment time [s]", y = "Vector magnitude count",
       title = "Vector magnitude count (in 3-sec windows) over the experiment time",
       color = "Activity\nlabel") + 
  theme_minimal()

Work already published with this data set

ADEPT pattern-matching method for precise walking stride segmentation from data

Automatic car driving detection algorithm

  • Straczkiewicz, M., Urbanek, J.K., Fadel, W.F., Crainiceanu, C.M., Harezlak, J. Automatic car driving detection using raw accelerometry data (2016) Physiological Measurement.

    • Proposed a Detection Algorithm of Driving via Accelerometry (DADA), designed to detect time periods when an individual is driving a car.

    • The methodological approach is based on short-time Fourier transform (STFT) applied to the raw accelerometry data and identifies and focuses on frequency vibration ranges that are specific to car driving.

Differentiating Between Walking and Stair Climbing

  • Fadel, W.F., Urbanek, J.K., Albertson, S.R., Li, X., Chomistek, A.K., Harezlak, J. Differentiating Between Walking and Stair Climbing Using Raw Accelerometry Data (2019) Statistics in Biosciences.

    • Classification method for the detection of walking and its subclasses (level walking, descending stairs, and ascending stairs).

    • Evaluate the effects of sensor location and tuning parameters on the classification accuracy of the tree models.

Class project ideas

Room for exploration: associations between features of accelerometry data and demographics info

  • Association of features of walking and participants demographics projects.

    • Areas of exploration: vector magnitude count (\(vmc\)), frequency domain analysis of walking time-series, ADEPT-segmented strides variability (stride duration time variability, stride pattern variability).

  • Prediction/classification projects.

    • Can we predict age/sex/BMI of participant based on walking data? Based on driving data?

    • Can we predict walking subtype (level walking, ascending stairs, descending stairs) purely from data? What if demographics are added?

References

[1]: Fadel, W.F., Urbanek, J.K., Albertson, S.R., Li, X., Chomistek, A.K., Harezlak, J. Differentiating Between Walking and Stair Climbing Using Raw Accelerometry Data (2019) Statistics in Biosciences, 11(2), 334–354. doi: 10.1007/s12561-019-09241-7

[2]: Straczkiewicz, M., Urbanek, J.K., Fadel, W.F., Crainiceanu, C.M., Harezlak, J. Automatic car driving detection using raw accelerometry data (2016) Physiological Measurement, 37(10), 1757–1769.

[3]: Karas, M., Bai, J., Strączkiewicz, M., Harezlak, J., Glynn, N. W., Harris, T., … Urbanek, J. K. (2019). Accelerometry Data in Health Research: Challenges and Opportunities. Review and Examples. (2019). Statistics in Biosciences, 11, 210–237.

[4]: Karas, M., Straczkiewicz, M., Fadel, W., Harezlak, J., Crainiceanu, C., Urbanek, J.K. Adaptive empirical pattern transformation (ADEPT) with application to walking stride segmentation (2019) Biostatistics.